When using all the Essentia features, the dendrogram tends to be skewed and asymmetrical, with most of the branches clustering towards the left side. This suggests that the data is concentrated in a specific area, making it harder to assess the similarity between tracks. The resulting over-clustering can make it difficult to interpret the relationships between tracks effectively.
When filtering down to just the features arousal, instrumentalness, and tempo, the dendrogram becomes more symmetrical and clearer. With fewer and more relevant features, the clusters in the dendrogram appear more distinct and balanced, making it easier to differentiate between tracks that share stronger similarities. This reduced complexity allows for a clearer visualization of the data, which aligns better with the perceived musical traits of the tracks.
Although I initially perceived the two tracks I focused on as sounding quite similar, the dendrogram shows that they are, in fact, quite distant from each other. The first track is positioned on the far left of the dendrogram, grouped closely with tracks like ties-o-2 and wednesday-w-2. In contrast, the second track is placed more towards the right, clustering with tracks like daniel-p-2 and sarya-h-2.
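As a rough illustration of why the choice of features changes the dendrogram, the sketch below standardises a few features and computes the pairwise distances that a hierarchical clustering would then merge. The track names and feature values are invented for illustration and are not taken from the class corpus; the use of z-scores and Euclidean distance is also an assumption, not necessarily what the dashboard does.

```python
import math
from statistics import mean, pstdev

# Hypothetical feature values, invented purely for illustration;
# they are NOT the values extracted from the class corpus.
tracks = {
    "track-a": {"arousal": 4.2, "instrumentalness": -0.8, "tempo": 120.0},
    "track-b": {"arousal": 4.0, "instrumentalness": -0.6, "tempo": 118.0},
    "track-c": {"arousal": 5.5, "instrumentalness": 0.9, "tempo": 95.0},
}


def standardise(tracks, features):
    """Z-score each feature so that no feature dominates purely by scale."""
    stats = {}
    for f in features:
        values = [t[f] for t in tracks.values()]
        stats[f] = (mean(values), pstdev(values))
    return {
        name: [(t[f] - stats[f][0]) / stats[f][1] for f in features]
        for name, t in tracks.items()
    }


def euclidean(u, v):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(tracks, features):
    """Pairwise distances between all tracks, keyed by (name, name)."""
    z = standardise(tracks, features)
    names = sorted(z)
    return {(a, b): euclidean(z[a], z[b]) for a in names for b in names}


dist = distance_matrix(tracks, ["arousal", "instrumentalness", "tempo"])
for pair in [("track-a", "track-b"), ("track-a", "track-c")]:
    print(pair, round(dist[pair], 3))
```

Passing a different feature list to `distance_matrix` changes every distance, which is why the same two tracks can sit close together under one feature set and far apart under another.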
Track 1
When listening to ties-o-2 and wednesday-w-2, and comparing them to my first track, I initially noticed little to no similarity. However, the heatmap reveals that these three tracks share relatively similar values in instrumentalness. Surprisingly, all three have negative values, which I didn’t expect, particularly for my own track. Although I hear many instrumental sounds throughout the song, including a Vietnamese zither, drums, and other traditional instruments, the system does not seem to register them. This raises concerns about whether non-Western traditional instruments can be effectively captured and represented by the system.
Track 2
When comparing daniel-p-2 and sarya-h-2 with my second track, I recognize a similarity in their tempo, as all three tracks have a relatively slow tempo. Additionally, their song structures are quite simple, which I believe explains why their arousal and tempo values are closely grouped together in the heatmap.
Track 1+2
I am still quite concerned about the system’s ability to accurately extract instrumentalness, as my first track has a negative value, while the second track has a positive one. However, upon listening to both tracks, I noticed that while the second track features flute sounds, it seems to predominantly contain electronic sound effects, unlike track 1, which is more focused on the combination of various traditional instruments. This discrepancy raises questions about how the system differentiates between non-Western traditional instruments and electronic sounds, and whether it can fully capture the nuanced instrumental features present in each track.
Confusion matrix (rows: prediction, columns: truth)

Prediction    Truth: AI    Truth: Non-AI
AI            36           17
Non-AI        13           24
The mosaic on the left illustrates the performance of a classifier attempting to distinguish between AI-generated and non-AI-generated tracks. Using a k-nearest-neighbour classifier, the feature combination that classified the tracks best was instrumentalness + danceability + tempo. Among the feature combinations tested, it gave the highest counts for AI-AI and non-AI-non-AI (prediction-truth), i.e. the correct predictions. This suggests that these features may be useful for identifying whether a track is generated by AI or not.
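To put numbers on "performance", the counts in the confusion matrix can be turned into accuracy, precision and recall. The sketch below uses only the four counts reported above; the function names are illustrative.

```python
# Counts from the confusion matrix: keys are (prediction, truth).
confusion = {
    ("AI", "AI"): 36,
    ("AI", "Non-AI"): 17,
    ("Non-AI", "AI"): 13,
    ("Non-AI", "Non-AI"): 24,
}


def accuracy(cm):
    """Share of all tracks whose prediction matches the truth."""
    correct = sum(n for (pred, truth), n in cm.items() if pred == truth)
    return correct / sum(cm.values())


def precision(cm, label):
    """Of the tracks predicted as `label`, the share that really are."""
    predicted = sum(n for (pred, _), n in cm.items() if pred == label)
    return cm[(label, label)] / predicted


def recall(cm, label):
    """Of the tracks that really are `label`, the share that were found."""
    actual = sum(n for (_, truth), n in cm.items() if truth == label)
    return cm[(label, label)] / actual


print(f"accuracy: {accuracy(confusion):.3f}")
for label in ("AI", "Non-AI"):
    print(f"{label}: precision {precision(confusion, label):.3f}, "
          f"recall {recall(confusion, label):.3f}")
```

With these counts the classifier gets 60 of 90 tracks right (accuracy of about 0.67). Recall is higher for AI tracks (36 of 49) than for non-AI tracks (24 of 41), so the classifier misses non-AI tracks more often than AI ones.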
A study by two researchers from Hungary, Monica Coronel and Anna Irimiás, confirms that music plays an essential supporting role in “destination promotional videos” and “tourism marketing,” stimulating both cognitive and affective responses. Specifically, their research reveals that background music can capture attention, reflect a destination’s characteristics, target specific audiences, highlight cultural identity, elicit emotions, and create ambience.
These findings about the importance of music in tourism marketing led me to explore Vietnamese advertising music and compare it with global music trends. In particular, my research question focuses on:
“How does the musical style of Vietnamese advertising music compare to other music? Does it have distinct characteristics, or does it align with broader global trends?”
To represent Vietnamese advertising music, I selected two tracks suitable for advertising videos showcasing Vietnamese culture and nature. After experimenting with generative AI tools, I opted for royalty-free tracks from Pixabay and SoundCloud. I used keywords such as “Vietnam,” “folk instruments,” “adventurous music,” and “travel” on both platforms, and filtered for “bright” mood and “cinematic music” theme on Pixabay. I chose these tracks because they feature Vietnamese folk instruments—a key focus—and include a strong bass that enhances engagement and evokes emotions in listeners, aligning well with the commercial and storytelling purposes of advertising videos.
To support and contextualize the comparisons with other “global music trends”, I will analyze Vietnamese advertising music alongside three Western music styles observed in the class corpus: rock (lennart-p-2), blue jazz (gijs-s-2), and traditional jazz (jasper-v-1). These genres provide contrasting perspectives on harmony, loudness dynamics, timbre, and rhythmic structure, allowing me to assess whether Vietnamese advertising music exhibits distinctive characteristics or aligns with broader global trends.
This interactive boxplot presents the distribution of various Essentia features extracted from the class corpus. The black points represent all tracks in the dataset, while my tracks are highlighted in pink for better visibility.
My tracks are scattered across different features, showing varying degrees of similarity and uniqueness compared to the “average” track in the corpus:
Arousal, Danceability, Engagingness and Valence: My tracks tend to be closer to the median or remain within the general range of the corpus, suggesting they align with the typical characteristics of the class corpus
Approachability & Instrumentalness: My two tracks are both positioned above the median, toward the higher end of the distribution, indicating that they score higher on these features than most tracks in the corpus
Tempo: My tracks fall outside the interquartile range (IQR), one toward the upper end and one toward the lower end, showing that they are noticeably faster or slower than the majority of the class corpus
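The "outside the interquartile range" reading of the boxplot can be checked numerically. The following is a minimal sketch using hypothetical tempo values (not the class corpus); note that quartile conventions differ slightly between tools, so the bounds may not match a given boxplot exactly.

```python
from statistics import quantiles


def iqr_bounds(values):
    """Return (Q1, Q3) using the 'inclusive' quartile convention."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3


def outside_iqr(value, values):
    """True if `value` lies below Q1 or above Q3 of `values`."""
    q1, q3 = iqr_bounds(values)
    return value < q1 or value > q3


# Hypothetical tempo values in BPM, invented for illustration only.
corpus_tempi = [80, 90, 100, 110, 120, 130, 140, 150, 160]

print(iqr_bounds(corpus_tempi))
print(outside_iqr(170, corpus_tempi))  # a fast track
print(outside_iqr(120, corpus_tempi))  # a track at the median
```

A track flagged by `outside_iqr` belongs to the faster or slower quarter of the corpus, which is a weaker statement than being an outlier in the boxplot sense.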
Based on the distribution of my tracks compared to the class corpus, the key insights are:
My tracks are not drastically different in features like danceability and arousal, meaning they share common rhythmic and energetic characteristics with the class corpus
While my tracks generally follow the overall trends in most features, their placement in instrumentalness and approachability suggests a distinct musical approach, likely incorporating many traditional instruments with simpler harmonies and familiar structures to enhance accessibility for a diverse audience
Tempo also sets my tracks apart: one leans towards a fast pace, while the other adopts a slower, more relaxed pace
This visualization provides a clear comparison of how my tracks align with the broader dataset and which features distinguish them. It suggests that Essentia captures many track characteristics and highlights both the similarities and the unique elements of my tracks.
The first chromagram reveals a dynamically structured piece that doesn’t settle on a single tonal center but rather employs a wide array of pitch classes throughout its duration
Broad Pitch Utilization: The entire 12-tone pitch spectrum is active throughout the piece. Bright bands appear across nearly all pitch classes, indicating that the composition does not fixate on one key but instead incorporates chromatic elements or frequent modulations
Recurring Clusters: Noticeable clusters of intense activity at specific time intervals suggest repeated melodic or harmonic motifs, hinting at the use of recurring chord progressions or thematic material
Chroma-based Self-Similarity Matrix
The block-like structures and distinct lines are more apparent, indicating sections of the track where harmonic repetition or homogeneity occurs:
Block-like structures: These represent homogeneous musical sections such as verses or choruses
Distinct, sometimes blurred, paths parallel to the main diagonal: These indicate repeated sections occurring at regular time intervals, even if the patterns aren’t perfectly sharp
Timbre-based Self-Similarity Matrix
The block-like structures are less clear. Instead, the streaks are more blurred and evenly distributed, suggesting that there is variability in timbre throughout the track
The absence of distinct parallel diagonal lines may indicate that the track experiences significant changes in instrumentation or arrangement between different sections
Bright areas appearing along the edges and center might represent sections where there are changes in instrumentation or performance style, such as a drop or a solo instrumental segment
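The block and diagonal-path patterns described for the self-similarity matrices can be reproduced on toy data. The sketch below assumes cosine distance between feature frames (one common choice, not necessarily the one used for these plots) and uses made-up three-dimensional frames in place of real chroma or timbre vectors.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means the two frames point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def self_similarity(frames):
    """Distance from every frame to every other frame of the same track."""
    return [[cosine_distance(a, b) for b in frames] for a in frames]


# Hypothetical frames: section A (2 frames), section B (2 frames), A again.
frames = [
    [1, 0, 0], [1, 0, 0],
    [0, 1, 0], [0, 1, 0],
    [1, 0, 0], [1, 0, 0],
]

ssm = self_similarity(frames)
for row in ssm:
    print(" ".join(f"{d:.0f}" for d in row))
```

In the printed matrix, the 2x2 squares of zeros on the main diagonal are the homogeneous blocks, and the zeros four steps off the diagonal form the parallel path created when section A returns.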
These chordograms visualize the harmonic structure of Tracks 1 and 2, displaying the evolution of chords over time. The Y-axis lists the chords considered, including major (maj), minor (min), dominant 7th (7), and diminished chords, while the X-axis represents time in seconds. The color intensity indicates the activation strength or presence probability of each chord at any given moment, with bright yellow signifying strong chord presence and dark purple indicating weaker or less frequent occurrences.
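One common way to obtain such chord activations is template matching: each chroma frame is compared with an idealised pitch-class pattern for every chord. The sketch below is an assumption about the general technique, not the exact method behind these plots; it covers only major and minor triads (no 7th or diminished chords) and uses cosine similarity as the score.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def triad_template(root, minor=False):
    """Binary 12-bin template with the root, third and fifth switched on."""
    template = [0.0] * 12
    third = 3 if minor else 4
    for interval in (0, third, 7):
        template[(root + interval) % 12] = 1.0
    return template


# Illustrative chord dictionary: 12 major and 12 minor triads.
TEMPLATES = {}
for root, name in enumerate(PITCH_CLASSES):
    TEMPLATES[name + ":maj"] = triad_template(root)
    TEMPLATES[name + ":min"] = triad_template(root, minor=True)


def cosine_similarity(u, v):
    """Cosine similarity, returning 0.0 for an all-zero (silent) frame."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / norm


def chord_scores(chroma):
    """Activation of every chord template for one 12-bin chroma frame."""
    return {name: cosine_similarity(chroma, t) for name, t in TEMPLATES.items()}


def best_chord(chroma):
    """Name of the chord whose template matches the frame most closely."""
    scores = chord_scores(chroma)
    return max(scores, key=scores.get)


# Hypothetical chroma frame with energy on C, E and G only.
frame = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
print(best_chord(frame))
```

Stacking the `chord_scores` of successive frames column by column would give a matrix of the kind a chordogram displays, with one row per chord and one column per time step.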
Track 1
The chordogram shows a relatively stable harmonic structure throughout the piece, with minimal drastic changes
The pitch material appears to be concentrated in specific regions, particularly around G♭ major, A♭ major, B major, and D♯ minor
The intensity distribution is fairly even, suggesting recurrent harmonic patterns rather than abrupt modulations. This track lacks significant harmonic shifts, indicating a more consistent chord progression and possibly a repetitive structure
Track 2
Unlike Track 1, this track displays more frequent variations in harmonic intensity, which suggests a more dynamic harmonic progression
There are clear moments of discontinuity around 60s, 100s, 160s, and 190s, indicating modulations or transitions between different sections of the piece
The intensity variations across time suggest moments of greater harmonic complexity, potentially due to instrumental improvisation
These keygrams exhibit a more ambiguous structure, with a diverse and less clearly defined focus on specific musical keys throughout the track.
(I plan to analyze this further in the future, as I find some aspects of it quite confusing at the moment :) )
Track 1
- Fourier-based tempogram:
Prominent, stable horizontal lines appear consistently at multiple tempo levels (~120 BPM, 240 BPM, 360 BPM, etc.), clearly indicating a fundamental tempo around 120 BPM, with the additional lines representing tempo harmonics (integer multiples of the fundamental)
The rhythmic structure is highly repetitive and steady, reflecting a clear rhythmic pattern throughout the duration of the track
Minimal tempo variation occurs, with slight exceptions at the opening (0–5s) and the ending section (around 190s onward), implying strong rhythmic stability. I suspect that these variations subtly highlight a structured musical form, including an introduction, main body, and conclusion
- Cyclic tempogram: A simplified visualization that wraps tempo octaves (tempi related by a factor of two) back into a single tempo range
Clearly highlights a stable fundamental tempo around 118 BPM (near the 120 BPM estimated from the Fourier-based, non-cyclic tempogram)
Noticeable gradual tempo modulations occur, with the tempo starting slower (approximately 90 BPM) and gradually increasing at the beginning, and slightly decreasing towards the end (around 115 BPM)
Track 2
- Fourier-based tempogram:
Presents similarly strong, stable horizontal lines but around multiple tempo levels (~100 BPM, 200 BPM, 300 BPM, etc.)
However, unlike Track 1, this track shows short and subtle rhythmic interruptions at certain moments (around 5s, 40s, 110s, 140s, etc.)
- Cyclic tempogram:
Clearly isolates the main fundamental tempo around 100 BPM
Reveals rhythmic variations more clearly (seen as lighter vertical lines at multiple time points), which suggests changes in beat strength and brief musical transitions, consistent with what I hear when listening to the track
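The relationship between the lines in the two kinds of tempogram can be sketched with a little arithmetic. The example below assumes that a cyclic tempogram treats tempi a factor of two apart as equivalent; the reference range starting at 80 BPM and the function names are illustrative choices, not settings taken from these plots.

```python
def tempo_harmonics(fundamental, count=3):
    """Integer multiples of a tempo, as seen in a Fourier-based tempogram."""
    return [fundamental * k for k in range(1, count + 1)]


def fold_tempo(bpm, low=80.0):
    """Halve or double a tempo until it lies in the range [low, 2 * low)."""
    if bpm <= 0:
        raise ValueError("tempo must be positive")
    while bpm >= 2 * low:
        bpm /= 2
    while bpm < low:
        bpm *= 2
    return bpm


for bpm in tempo_harmonics(120):
    print(bpm, "->", fold_tempo(bpm))
```

Under this assumption 240 BPM folds back onto 120 BPM, while 360 BPM (three times the fundamental, so not an octave of it) folds to 90 BPM rather than 120 BPM.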
-> Based solely on the tempogram analysis of these tracks, Vietnamese advertising background music appears to be characterized by clear, stable fundamental tempo structures, often accompanied by identifiable harmonic patterns. This rhythmic consistency suits a comfortable listening experience while viewing the nature and culture presented in a video, which is essential in promotional contexts. The tempograms still show visible changes at certain time points, but these are small and, on careful listening, seem to stem mainly from changes in instrumentation.
(I still don’t know how to change the code in order to make all graphs have the same size, if you know, please help meeee! thank you in advance!)